Hidden in Plain Sight

Authors

  • Navneet Potti
  • James B. Wendt
  • Qi Zhao
  • Sandeep Tata
  • Marc Najork
Abstract

A vast majority of the emails people receive today are machine-generated by businesses communicating with consumers. While some emails originate from a transaction (e.g., hotel or restaurant reservation confirmations, online purchase receipts, shipping notifications), a large fraction are commercial emails promoting an offer (a special sale, free shipping, a limited-time deal, etc.). The sheer number of these promotional emails makes it difficult for users to read them all and decide which ones are actually interesting and actionable. In this paper, we tackle the problem of extracting information from commercial emails that promote an offer to the user. This information enables an email platform to build several new experiences that unlock the value in these emails without the user having to navigate and read all of them. For instance, the platform can highlight offers that are expiring soon, or display a notification about an unexpired offer from a merchant when the user's phone recognizes that the user is at that merchant's store. A key challenge in extracting information from such commercial emails is that they are often image-rich and contain very little text. Training a machine learning (ML) model on rendered image-rich emails and applying it to each incoming email can be prohibitively expensive. We describe a cost-effective approach for extracting signals from both the text and image content of commercial emails in the context of Gmail, an email platform that serves over a billion users around the world. The key insight is to leverage the template structure of emails and to use off-the-shelf OCR techniques to obtain the text from images, augmenting the existing text features offline. Compared to a text-only approach, we show that we are able to identify 9.12% more email templates, corresponding to ~5% more emails being identified as offers.
Interestingly, our analysis shows that this 5% improvement in coverage is across the board, irrespective of whether the emails were sent by large merchants or small local merchants, allowing us to deliver an improved experience for everyone.

∗Work done while at Google.

This paper is published under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW 2018, April 23–27, 2018, Lyon, France
© 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY-NC-ND 4.0 License.
ACM ISBN 978-1-4503-5639-8/18/04.
https://doi.org/10.1145/3178876.3186167

CCS CONCEPTS
• Information systems → Email; Wrappers (data mining); • Applied computing → Optical character recognition;
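The abstract's key insight (leverage template structure, run OCR offline) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the template key (sender plus a hash of the HTML tag sequence), the `TemplateOcrCache` class, and the pluggable `ocr` callable are all hypothetical names introduced here, and the sketch assumes that emails sharing a template also share their images. The point it shows is only that OCR cost can be paid once per template instead of once per email.

```python
import hashlib
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple


class _SkeletonParser(HTMLParser):
    """Collects the tag sequence and image URLs, ignoring text content."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


def template_key(sender: str, html: str) -> Tuple[str, List[str]]:
    """Return a (hypothetical) template key and the image URLs in the email.

    Assumption: emails from one sender with the same tag skeleton belong
    to the same template, regardless of the per-recipient text.
    """
    parser = _SkeletonParser()
    parser.feed(html)
    digest = hashlib.sha256("/".join(parser.tags).encode("utf-8")).hexdigest()
    return f"{sender}:{digest[:16]}", parser.image_urls


@dataclass
class TemplateOcrCache:
    """Runs OCR once per template and reuses the text for later emails."""

    ocr: Callable[[str], str]  # any OCR backend: image URL -> extracted text
    ocr_calls: int = field(default=0, init=False)
    _text_by_template: Dict[str, str] = field(default_factory=dict, init=False)

    def image_text(self, sender: str, html: str) -> str:
        key, urls = template_key(sender, html)
        if key not in self._text_by_template:
            # Expensive step, done only the first time a template is seen.
            self.ocr_calls += len(urls)
            self._text_by_template[key] = " ".join(self.ocr(u) for u in urls)
        return self._text_by_template[key]
```

In use, the cached image text would be appended to an email's existing text features before classification, so that a second email rendered from the same template triggers no further OCR calls.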

Similar resources

Hidden in Plain Sight: Untapped Riches of Meso-Level Entrepreneurship Mechanisms

Entrepreneurial action is embedded within a variety of complex social structures, not all of which can be as easily defined or measured as macro-institutional or micro-individual characteristics, but collectively hold rich insights into the actual causal mechanisms influencing action. To address this problem, we call upon researchers to broaden their levels of analysis and direct their focus to...

Evaluation of land subsidence in Kashmar-Bardaskan plain, NE Iran

The development of agriculture and industry and the increase of population in countries with arid to semi-arid climates have led to more harvesting of groundwater resources and, as a result, land subsidence in different parts of the world. Decades of groundwater overexploitation in the Kashmar-Bardaskan plain in the north-east of Iran have resulted in substantial land subsidence in this plain. The p...

QR Code Steganography

QR codes, also known as matrix codes, are basically two dimensional barcodes embedded with data that can be decoded quickly for information. In this work, we present a novel use of QR codes. We show that QR codes can be used for covert communication using steganography. We also show in complete detail how to build QR code symbols with a hidden payload and how to extract this hidden information ...

Bio-urban design and the Hidden Rules of Nature

In each period of time there has been a turning point at which humans discovered a new view of the world and the order of nature, and then expressed this relation in numerical, artistic and industrial language. Biourbanism focuses on the urban organism, considering it a hypercomplex system, according to its internal and external dynamics and their mutual interactions. Nowadays when it is t...

Groundwater level simulation using artificial neural network: a case study from Aghili plain, urban area of Gotvand, south-west Iran

In this paper, the Artificial Neural Network (ANN) approach is applied for forecasting groundwater level fluctuation in Aghili plain, southwest Iran. An optimal design is completed for the two hidden layers with four different algorithms: gradient descent with momentum (GDM), Levenberg-Marquardt (LM), resilient back propagation (RP), and scaled conjugate gradient (SCG). Rain, evaporation, relative...

Publication date: 2018